Probability Perspective and Bayesian Perspective

Probability Perspective

For a statistical model $p(x; w)$, the likelihood function is a function of the parameter $w$.

Probability: $p(x; w)$ describes the distribution of the random variable $x$ when the parameter $w$ is fixed.

Likelihood: $p(x; w)$ describes the influence of different parameters $w$ when the random variable $x$ is known (observed).
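The distinction is easiest to see numerically. Below is a minimal sketch (assuming NumPy and SciPy are available, with a hypothetical Gaussian model $p(x; w) = \mathcal{N}(x; \mu, \sigma)$): the same density formula is read once as a function of $x$ with $w$ fixed, and once as a function of $w$ with $x$ fixed.

```python
import numpy as np
from scipy.stats import norm

# Probability view: fix the parameter w = (mu, sigma), vary x.
mu, sigma = 0.0, 1.0
xs = np.linspace(-3, 3, 7)
print(norm.pdf(xs, loc=mu, scale=sigma))      # distribution of x under fixed w

# Likelihood view: fix the observed x, vary the parameter mu.
x_obs = 1.5                                   # hypothetical observation
mus = np.linspace(-2, 2, 9)
print(norm.pdf(x_obs, loc=mus, scale=sigma))  # same formula, read as L(w)
```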

Maximum Likelihood Estimation (MLE)

Assume $X_1, X_2, \ldots, X_n$ are samples from $X$; the probability of observing the samples $x_1, x_2, \ldots, x_n$ is $\prod_{i=1}^{n} p(x_i; w)$.

Likelihood function:

$$L(w) = L(x_1, x_2, \ldots, x_n; w) = \prod_{i=1}^{n} p(x_i; w)$$

Maximize the likelihood function to get $\hat{w}$:

$$\hat{w} = \arg\max_{w} L(x_1, x_2, \ldots, x_n; w)$$

where $\hat{w}(x_1, x_2, \ldots, x_n)$ is the maximum likelihood estimate and $\hat{w}(X_1, X_2, \ldots, X_n)$ is the maximum likelihood estimator (a statistic).

Log-likelihood equation (equivalent to the likelihood equation, since $\log$ is monotonically increasing, so both have the same solution $\hat{w}$):

$$\frac{dL(w)}{dw} = 0 \iff \frac{d \log L(w)}{dw} = 0$$
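As a concrete check, here is a minimal sketch for a Bernoulli model $p(x; w) = w^x (1-w)^{1-x}$ (the data and the use of scipy.optimize are illustrative assumptions, not from the original notes): solving the log-likelihood equation analytically gives $\hat{w} = \bar{x}$, and numerical maximization of $L(w)$ agrees.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Hypothetical Bernoulli samples x_1, ..., x_n
x = np.array([1, 0, 1, 1, 0, 1, 1, 0])

def neg_log_likelihood(w):
    # -log L(w) = -sum_i [ x_i log w + (1 - x_i) log(1 - w) ]
    return -np.sum(x * np.log(w) + (1 - x) * np.log(1 - w))

# Maximizing L(w) is the same as minimizing -log L(w) over w in (0, 1)
res = minimize_scalar(neg_log_likelihood, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(res.x, x.mean())  # numerical w_hat matches the closed-form solution x_bar
```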

Bayesian Perspective

When the training data are relatively scarce, MLE tends to overfit, resulting in inaccurate parameter estimates.

Remedy: add prior knowledge about the parameters.

Bayesian Learning: treat the parameter $w$ as a random variable. Objective: given a set of observed data $X$, find the distribution $p(w \mid X)$ of the parameter $w$.

$p(w \mid X)$ is also called the posterior distribution.

Bayes' rule gives the relationship between $p(y \mid x)$ and $p(x \mid y)$:

$$p(y \mid x) = \frac{p(x \mid y)\, p(y)}{p(x)}$$

Applied to the parameter $w$ and the observed data $X$:

$$p(w \mid X) \propto p(X \mid w)\, p(w)$$

$$\text{Posterior} \propto \text{Likelihood} \times \text{Prior}$$
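A minimal sketch of this update for a Bernoulli likelihood with a conjugate Beta prior (the Beta(2, 2) prior and the data are illustrative assumptions): the posterior is again a Beta distribution, and the prior pulls the estimate away from the MLE, which is exactly what helps when the training data are scarce.

```python
import numpy as np
from scipy.stats import beta

# Hypothetical observed Bernoulli data X
X = np.array([1, 0, 1, 1, 0, 1, 1, 0])
heads, tails = X.sum(), len(X) - X.sum()

# Beta(a, b) prior on w encodes knowledge before seeing the data
a, b = 2.0, 2.0

# Conjugacy: p(w | X) ∝ p(X | w) p(w) is again a Beta distribution,
# namely Beta(a + heads, b + tails)
posterior = beta(a + heads, b + tails)
print(posterior.mean())  # posterior mean of w, pulled toward the prior mean 0.5
print(X.mean())          # MLE w_hat for comparison
```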